DETAILED ACTION 

1. This action is responsive to the following communication: Amendment filed on January 
14, 2003. 

2. Claims 1-34 are pending. Claims 1 and 34 are independent claims. 

Specification 

3. The disclosure is objected to because it contains an embedded hyperlink and/or other 
form of browser-executable code. Applicant is required to delete the embedded hyperlink and/or 
other form of browser-executable code. See MPEP § 608.01. 

The Examiner notes that this objection was made in the Office action mailed on 
November 1, 2002 (hereinafter "the previous Office action") and that applicants made 
amendments (putting hyperlinks in quotes and/or removing the "http" preface to URLs) in an 
attempt to overcome the objection. The reason for the objection is that hyperlinks frequently 
change or become obsolete, and therefore hyperlinks or URLs in any form are not appropriate in 
a patent application. If applicants wish to have material found at a given URL included in the 
record, the proper means for doing so is to print the material and submit it in an Information 
Disclosure Statement. In other cases (for example, Table 1 on pages 16-17), subject matter that 
includes hyperlinks may be disclosed in the drawings. 

Claim Rejections - 35 U.S.C. § 103 

4. The text of those sections of Title 35, U.S. Code not included in this action can be found 
in a prior Office action. 

5. With respect to the rejection of each dependent claim below, the preceding rejection(s) of 
the relevant base claim(s) is/are incorporated therein. 

6. Claims 1-14, 17-19, and 26-34 are rejected under 35 U.S.C. 103(a) as being unpatentable 
over U.S. Patent Number 5,835,905 to Pirolli et al. (hereinafter "Pirolli"), issued November 10, 
1998, filed April 9, 1997, in view of U.S. Patent Number 5,960,422 to Prasad (hereinafter 
"Prasad"), issued September 28, 1999, filed November 26, 1997. 

Regarding independent claim 1, Pirolli does not disclose establishing a plurality of 
training sets wherein each training set is associated with a category and includes training 
documents that have been classified as belonging to said associated category. However, Prasad 
does make such a disclosure. (Prasad, col. 4, lines 17-21: "In order to use Rule-based Induction 
a set of training documents 24 is collected from each source 20. The training set 24 is created by 
a random set of documents from each source, typically about 90% of the documents with the 
remaining 10% of the documents forming a test set 26.") Further, Prasad provided motivation to 
follow his teaching by explaining that it would enable searches that were more likely to satisfy a 
user query. (Prasad, col. 2, lines 50-67.) Therefore, it would have been obvious to one of 
ordinary skill in the art to have established a plurality of training sets wherein each training set is 
associated with a category and includes training documents that have been classified as 
belonging to said associated category. 

Further, Pirolli discloses determining how strongly each document corresponds to each of 
the categories by determining similarity between each document and a set of criteria established 
for the category. (Pirolli, col. 8, lines 8-47.) Pirolli does not disclose determining similarity 
between each document and the training documents that belong to the training set of each 
category. However, one of ordinary skill in the art would have 
recognized that using sets of training documents to automatically define categories would have 
provided the benefit of basing criteria on documents actually taken from the category which, 
Prasad explains (as noted in the preceding paragraph), would have provided more accurate 
search results. Therefore, it would have been obvious to one of ordinary skill in the art to extend 
Pirolli with Prasad's teaching of using training documents. 

Further, Pirolli discloses the step of determining similarity performed using a matrix 
representing document similarity that is derived by combining two or more measures of 
document similarity. (Pirolli, col. 11, lines 36-39: "An activation network can be represented as 
a graph defined by matrix R, where each off-diagonal element R_ij contains the strength of 
association between nodes i and j, and the diagonal contains zeros."; col. 8, lines 8-13: "In order 
to perform categorizations each Web page at the Web locality is represented by a vector of 
features constructed from the above topology, meta-information, usage statistics and paths, and 
text similarities. These Web page vectors are collected into a matrix. Such a matrix is illustrated 
in FIG. 5.") 

Regarding dependent claim 2, Pirolli discloses the measures of document similarity 
including hyperlink similarity inasmuch as Pirolli teaches that the matrix contains "inlinks, the 
number of hyperlinks that point to the item from the web locality (column 504) [and] outlinks, 
the number of hyperlinks the item contains that point to other items in the web locality (column 
505)." (Pirolli, col. 8, lines 19-23; Fig. 5.) 

Regarding dependent claim 3, Pirolli discloses documents considered similar to each 
other when there is a link from one to the other, or when the two documents link to, or are linked 
to by, a set of other associated documents inasmuch as Pirolli teaches that the matrix contains 
"inlinks, the number of hyperlinks that point to the item from the web locality (column 504) 
[and] outlinks, the number of hyperlinks the item contains that point to other items in the web 
locality (column 505)." (Pirolli, col. 8, lines 19-23; Fig. 5.) 

Application/Control Number: 09/333,121 
Art Unit: 2176 

Regarding dependent claim 4, Pirolli discloses certain hyperlinks having greater or 
lesser similarity weight than other hyperlinks based on other features of the links or their source 
or destination documents inasmuch as Pirolli teaches "an approach based on weighted linear 
equations that define the rules for predicting degree of category membership for each page at a 
web locality. That is, equations are of the form 

(1) c_i = w_1 v_1 + w_2 v_2 + ... + w_n v_n 
for all pages i in a Web locality, where the v_j are the measured features of each Web page, and 
the w_j are weights." (Pirolli, col. 8, lines 41-48.) 
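
For illustration only, and not as part of the cited disclosure, the weighted linear form of equation (1) can be sketched as follows. The function name, the choice of features, and the weight values are hypothetical.

```python
def category_score(features, weights):
    """Return w_1*v_1 + w_2*v_2 + ... + w_n*v_n for a single page."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sum(w * v for w, v in zip(weights, features))


# Hypothetical measured features of one page: inlinks, outlinks, request frequency.
page_features = [12, 3, 40]
# Hypothetical weights for an "index page" category.
index_weights = [1.0, 0.5, 0.25]

print(category_score(page_features, index_weights))  # prints 23.5
```

A larger score indicates a stronger predicted degree of membership in the category whose weights were applied.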

Regarding dependent claim 5, Pirolli discloses the measures of document similarity 
including a similarity of text of the documents. (Pirolli, Fig. 5.) 

Regarding dependent claim 6, Pirolli discloses two documents being considered similar 
based on a comparison of word vectors derived from the text of each of the two documents. 
(Pirolli, col. 7, lines 57-65 - "The token information is then used to create a document vector, 
where each component of the vector represents a word, step 403. Entries in the vector for a 
document indicate the presence or frequency of a word in the document. The steps 401-403 are 
repeated for each Web page in the Web locality. For each pair of pages, the dot product of these 
vectors is computed, step 404. The dot product . . . produces a similarity measure.") 
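
As a minimal sketch of the word-vector comparison described in the quoted passage, and assuming simple whitespace tokenization (an assumption not drawn from Pirolli), the dot product of two term-frequency vectors might be computed as follows. The function names are hypothetical.

```python
from collections import Counter


def word_vector(text):
    """Map each word in the text to its frequency (whitespace tokenization assumed)."""
    return Counter(text.lower().split())


def dot_product(vec_a, vec_b):
    """Sum the products of the frequencies of words present in both vectors."""
    return sum(count * vec_b[word] for word, count in vec_a.items() if word in vec_b)


page_a = word_vector("web page about web search")
page_b = word_vector("search the web")
print(dot_product(page_a, page_b))  # prints 3
```

Pages that share no words receive a similarity of zero under this measure.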

Regarding dependent claim 7, Pirolli discloses text similarity determined in part based 
upon weight values assigned to words of the text, and wherein certain words have greater or 
lesser weight than other words inasmuch as Pirolli teaches that "entries in the vector for a 
document indicate the presence or frequency of a word in the document." (Pirolli, col. 7, lines 
59-61.) 

Regarding dependent claim 8, Pirolli discloses the measures of document similarity 
including user click-through similarity inasmuch as Pirolli teaches that one "kind of graph[], or 
network[], . . . used to represent strength of associations among Web pages [is] the usage paths, 
or flow of users through the locality." (Pirolli, col. 10, lines 59-60; Fig. 11.) 

Regarding dependent claim 9, Pirolli discloses documents associated by frequency of 
clicks inasmuch as Pirolli states that "[r]eferring now to FIG. 13, for the matrix representation of 
usage path networks, an entry of an integer strength, s >= 0, in column i row j, indicates the 
number of users that traversed from page i to page j." (Pirolli, col. 11, lines 30-34. See also 
Pirolli, col. 7, lines 15-18 - "From the set of paths, a vector that contains each page's frequency 
of requests is generated (i.e. a frequency vector), step 304, along with a path matrix containing 
the number of traversals from one page to another, step 305.") 
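
The frequency vector and path matrix described in the quoted passages can be sketched as follows, assuming each user path is available as an ordered list of page identifiers. The function name, the input format, and the sample paths are hypothetical and are not taken from Pirolli.

```python
from collections import Counter, defaultdict


def usage_statistics(paths):
    """Return (request frequency per page, traversal count per ordered page pair)."""
    frequency = Counter()
    traversals = defaultdict(int)
    for path in paths:
        frequency.update(path)  # every visit counts as one request
        for src, dst in zip(path, path[1:]):
            traversals[(src, dst)] += 1  # one traversal from src to dst
    return frequency, dict(traversals)


# Hypothetical user paths through three pages.
paths = [["A", "B", "C"], ["A", "B"], ["B", "C"]]
frequency, traversals = usage_statistics(paths)
print(frequency["B"])          # prints 3
print(traversals[("A", "B")])  # prints 2
```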

Regarding dependent claim 10, Pirolli discloses deriving measures of document 
similarity from patterns detected in user viewing of the documents inasmuch as Pirolli teaches 
use of "raw data [that] may be obtained from usage records or access logs of the web locality and 
by direct traversal of the Web pages in the Web locality" (Pirolli, col. 4, lines 57-60), and further 
states that "[t]he raw data is comprised of topology information, page meta-information, page 
frequency path information and text similarity information. . . . Usage frequency and path 
information indicate how many times a Web page has been accessed and how many times a 
traversal was made from one Web page to another." (Pirolli, col. 5, lines 2-10.) 

Regarding dependent claim 11, as discussed above regarding dependent claim 10, Pirolli 
discloses user viewing information monitored by a web caching system and stored in a log. 

Regarding dependent claim 12, Pirolli discloses documents being considered similar 
based on frequency of viewing inasmuch as Pirolli states that "[r]eferring now to FIG. 13, for the 
matrix representation of usage path networks, an entry of an integer strength, s >= 0, in column i 
row j, indicates the number of users that traversed from page i to page j." (Pirolli, col. 11, lines 
30-34. See also Pirolli, col. 7, lines 15-18 - "From the set of paths, a vector that contains each 
page's frequency of requests is generated (i.e. a frequency vector), step 304, along with a path 
matrix containing the number of traversals from one page to another, step 305.") 

Regarding dependent claim 13, Pirolli does not disclose measures of document 
similarity including URL similarity. However, Pirolli suggests using URL similarity as a 
measure of document similarity inasmuch as Pirolli teaches that the format and structure of 
documents' URLs as well as particular words found in documents' URLs might mean that the 
documents belong in the same category. (Pirolli, col. 9, lines 17-20, 24-28.) Therefore, it would 
have been obvious to one of ordinary skill in the art to have modified Pirolli to have used 
measures of document similarity including URL similarity. 

Regarding dependent claim 14, Pirolli does not disclose considering two documents 
similar if a URL of each document contains similar URL sub-components. However, Pirolli 
suggests considering two documents similar if a URL of each document contains similar URL 
sub-components inasmuch as Pirolli teaches that particular words found in documents' URLs 
might mean that the documents belong in the same category. (Pirolli, col. 9, lines 24-28.) 
Therefore, it would have been obvious to one of ordinary skill in the art to have modified Pirolli 
to have considered two documents similar if a URL of each document contains similar URL 
sub-components. 

Regarding dependent claim 17, Pirolli does not disclose achieving the combination of 
two or more measures of document similarity by taking the union of each of a plurality of 
graphs, each graph describing one of the measures of document similarity, to compute a 
combined graph that describes the combined document similarity. However, Pirolli suggests 
taking such a union inasmuch as Pirolli states that "three kind of graphs, or networks, are used to 
represent strength of associations among Web pages: (1) the hypetext [sic] link topology of a 
Web locality, (2) inter-page text similarity, and (3) the usage paths, or flow of users through the 
locality. Each of these networks or graphs is represented by matrices in our spreading activation 
algorithm." (Pirolli, col. 10, lines 56-63.) Therefore, it would have been obvious to one of 
ordinary skill in the art to have modified Pirolli to have taken such a union of a plurality of 
graphs. 

Regarding dependent claim 18, Pirolli does not disclose achieving the combination of 
two or more measures of document similarity by taking the intersection of each of a plurality of 
graphs, each graph describing one of the measures of document similarity, to compute a 
combined graph that describes the combined document similarity. However, Pirolli suggests 
combining graphs inasmuch as Pirolli states that "three kind of graphs, or networks, are used to 
represent strength of associations among Web pages: (1) the hypetext [sic] link topology of a 
Web locality, (2) inter-page text similarity, and (3) the usage paths, or flow of users through the 
locality. Each of these networks or graphs is represented by matrices in our spreading activation 
algorithm." (Pirolli, col. 10, lines 56-63.) Pirolli suggests that this combination could be an 
intersection inasmuch as Pirolli teaches that association strength can be zero, effectively meaning 
that a portion of a graph would be excluded from the combination. (Pirolli, col. 11, lines 1-34.) 
Therefore, it would have been obvious to one of ordinary skill in the art to have modified Pirolli 
to have taken such an intersection of a plurality of graphs. 
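
By way of illustration only, the union and intersection of several similarity graphs can be sketched as follows. The sketch assumes each graph is stored as a dictionary mapping a page pair to an association strength, that the union sums strengths, and that the intersection keeps the minimum strength of edges that are nonzero in every graph. None of these choices is drawn from Pirolli or from the claims, and all names and values are hypothetical.

```python
def graph_union(*graphs):
    """Keep every edge found in any graph, summing its strengths (assumed rule)."""
    combined = {}
    for graph in graphs:
        for edge, weight in graph.items():
            combined[edge] = combined.get(edge, 0) + weight
    return combined


def graph_intersection(*graphs):
    """Keep only edges with nonzero strength in every graph (minimum strength assumed)."""
    if not graphs:
        return {}
    shared = {edge for edge, weight in graphs[0].items() if weight > 0}
    for graph in graphs[1:]:
        shared &= {edge for edge, weight in graph.items() if weight > 0}
    return {edge: min(graph[edge] for graph in graphs) for edge in shared}


# Hypothetical link, text-similarity, and usage graphs over pages A, B, C.
links = {("A", "B"): 1, ("B", "C"): 1}
text = {("A", "B"): 0.5, ("A", "C"): 0.2}
usage = {("A", "B"): 3, ("B", "C"): 0}

print(graph_intersection(links, text, usage))  # prints {('A', 'B'): 0.5}
```

Under these assumptions an edge of zero strength in any one graph drops out of the intersection, which mirrors the exclusion effect noted above.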

Regarding dependent claim 19, insofar as that claim can be understood, Pirolli discloses 
extracting structural information from the similarity matrix to obtain new documents supported 
by the set of training documents for each category inasmuch as Pirolli teaches extracting 
information from matrix structures and using a spreading activation technique to "define the 
degree of predicted relevance of Web pages to the starting set of focus Web pages." (Pirolli, col. 
10, lines 8-35.) 

Regarding dependent claim 26, Pirolli discloses categories coming from a manually 
derived taxonomy inasmuch as Pirolli states that "for the classification of Web pages in the web 
locality, classification characteristics are provided, step 103. The classification characteristics are 
predetermined "rules" which are applied to the feature vectors of a page to determine the 
category of the page. For example, it may be desirable to have a classification of web pages as 
index types (contain primarily links to other pages) or content types (contain primarily 
information)." (Pirolli, col. 5, lines 12-19.) 

Regarding dependent claim 27, Pirolli does not disclose categories derived from logs of 
user queries. However, Pirolli does suggest such a step inasmuch as Pirolli teaches that "one of 
the three general sorts of information determine the need probabilities of information in memory, 
given a current focus of attention [is] . . . past usage patterns" (Pirolli, col. 4, lines 34-35), and 
further explains that such usage patterns are found in "usage records or access logs of the web 
locality." (Pirolli, col. 4, lines 58-59.) Therefore, it would have been obvious to one of ordinary 
skill in the art to have extended Pirolli to have derived categories from logs of user queries. 

Regarding dependent claim 28, Pirolli does not disclose creating and storing the matrix 
using columns representing documents and rows representing user sessions wherein values of 
elements of the second matrix represent interest in a document shown by a particular user in a 
particular session. However, Pirolli does teach creating and storing information about the 
number of times a document was requested within a given time period (Pirolli, col. 8, lines 24- 
25) and also suggests tracking user usage patterns (Pirolli, col. 4, lines 34-35), which suggests 
creating and storing a value that is the function of the amount of time a user spent viewing a 
document associated with a particular session. Therefore, it would have been obvious to one of 
ordinary skill in the art to have extended Pirolli to have created columns representing documents 
and rows representing user sessions wherein values represent interest in a document shown by a 
particular user in a particular session. 

Regarding dependent claim 29, Pirolli does not disclose creating and storing the matrix 
using rows representing documents and columns representing user sessions wherein values of 
elements of the second matrix represent interest in a document shown by a particular user in a 
particular session. However, Pirolli does teach creating and storing information about the 
number of times a document was requested within a given time period (Pirolli, col. 8, lines 24- 
25) and also suggests tracking user usage patterns (Pirolli, col. 4, lines 34-35), which suggests 
creating and storing a value that is the function of the amount of time a user spent viewing a 
document associated with a particular session. Therefore, it would have been obvious to one of 
ordinary skill in the art to have extended Pirolli to have created rows representing documents 
and columns representing user sessions wherein values represent interest in a document shown 
by a particular user in a particular session. 

Regarding dependent claim 30, Pirolli does not disclose element values computed as a 
function of a time that a user has spent viewing a document associated with each element. 
However, Pirolli does teach creating and storing information about the number of times a 
document was requested within a given time period (Pirolli, col. 8, lines 24-25) and also suggests 
tracking user usage patterns (Pirolli, col. 4, lines 34-35), which suggests creating and storing a 
value that is the function of the amount of time a user spent viewing a document associated with 
a particular session. Therefore, it would have been obvious to one of ordinary skill in the art to 
have extended Pirolli to have computed element values as a function of a time that a user has 
spent viewing a document associated with each element. 

Regarding dependent claim 31, Pirolli discloses creating and storing a second matrix 
representing a similarity between pairs of documents i and j wherein the second matrix is 
derived by comparing pairs of column vectors or row vectors respectively i and j of the first 
matrix inasmuch as Pirolli teaches generating three matrices representing similarity between 
documents (Pirolli, col. 10, lines 10-11; Figs. 9, 11, 13) from raw information entered in a first 
matrix (Pirolli, Fig. 5). 

Regarding dependent claim 32, Pirolli discloses creating and storing a second matrix 
representing a similarity between pairs of documents i and j inasmuch as Pirolli teaches 
generating three matrices representing similarity between documents (Pirolli, col. 10, lines 10- 
11; Figs. 9, 11, 13) from raw information entered in a first matrix (Pirolli, Fig. 5). Pirolli does 
not disclose finding pairs of documents i and j which have high interest values for a particular 
user in a particular session or period of time. However, Pirolli does teach creating and storing 
information about the number of times a document was requested within a given time period 
(Pirolli, col. 8, lines 24-25) and also suggests tracking user usage patterns (Pirolli, col. 4, lines 
34-35), which suggests comparing documents based on interest values for a particular user in a 
particular session or period of time. Therefore, it would have been obvious to one of ordinary skill in the 
art to have modified Pirolli to have created the second matrix as recited in claim 32. 

Regarding dependent claim 33, Pirolli discloses identifying a category of a classification 
taxonomy of the hypertext system in which a first electronic document is presently classified 
inasmuch as Pirolli states "for relevancy predictions, one or more Web pages for spreading 
activation are selected, step 105. The selected Web pages may be based on the category that it is 
in." (Pirolli, col. 5, lines 34-36.) 

Further, Pirolli discloses storing information that classifies the second electronic 
document into the category if the second electronic document is found to be highly similar 
inasmuch as Pirolli states that "activation is spread using the selected page as a focal point to 
generate a list of relevant pages, step 106." (Pirolli, col. 5, lines 40-42.) 

Regarding independent claim 34, Pirolli discloses a computer-readable medium carrying 
one or more sequences of instructions. (Pirolli, col. 13, lines 24-27.) 

Further, the rejection of claim 1 above is fully incorporated herein. 

7. Claims 15-16 are rejected under 35 U.S.C. 103(a) as being unpatentable over Pirolli and 
Prasad in view of U.S. Patent Number 6,282,549 B1 to Hoffert et al. (hereinafter "Hoffert"), 
issued August 28, 2001, filed March 29, 1999. 




Regarding dependent claim 15, Pirolli does not disclose measures of similarity including 
multimedia similarity. Hoffert, however, teaches that the type of multimedia file is relevant to a 
user querying a database for multimedia files, and teaches the classification of multimedia files 
with associated icons indicating file type. (Hoffert, col. 23, lines 46-67.) Therefore, it would 
have been obvious to one of ordinary skill in the art to have modified Pirolli and Prasad to have 
measures of similarity include multimedia similarity. 

Regarding dependent claim 16, Pirolli does not disclose considering two documents 
similar based on features derived from multimedia components linked to or contained by the 
documents. Hoffert, however, teaches the storage of a variety of multimedia file features for the 
storage, retrieval, and classification of multimedia documents. (Hoffert, col. 6, lines 10-32.) 
Therefore, it would have been obvious to one of ordinary skill in the art to have modified Pirolli 
and Prasad to consider two documents similar based on features derived from multimedia 
components linked to or contained by the documents. 

8. Claim 20 is rejected under 35 U.S.C. 103(a) as being unpatentable over Pirolli and 
Prasad in view of U.S. Patent Number 6,128,606 to Bengio et al. (hereinafter "Bengio"), issued 
October 3, 2000, filed March 11, 1997. 

Regarding dependent claim 20, Pirolli does not disclose obtaining structural information 
by optimizing an objective function. However, Bengio, in disclosing an invention "directed to 
the problem of developing a modular building block for complex processes that can input and 
output data in a wide variety of forms, but when interconnected with other similar modular 
building blocks can be easily trained" (Bengio, col. 2, lines 45-49), teaches "training a network 
of these modules by back-propagating gradients through the network to determine a minimum of 
the global objective function." (Bengio, col. 2, lines 57-60.) Because claim 20 is directed to a 
similar invention, it would have been obvious to one of ordinary skill in the art to have combined 
Pirolli, Prasad, and Bengio to implement the optimization of an objective function. 

9. Claims 21-25 are rejected under 35 U.S.C. 103(a) as being unpatentable over Pirolli and 
Prasad in view of U.S. Patent Number 6,389,436 to Chakrabarti et al. (hereinafter 
"Chakrabarti"), issued May 14, 2002, filed December 15, 1997. 

Regarding dependent claim 21, Pirolli does not disclose obtaining structural information 
by approximately optimizing an objective function. Chakrabarti, however, in the context of a 
document classifier similar to the invention of claim 21, discloses optimizing an objective 
function by "relaxation labeling" in which "[t]he iteration continues until a stopping criteria is 
reached." (Chakrabarti, col. 19, lines 17-20.) Therefore, it would have been obvious to one of 
ordinary skill in the art to have combined Pirolli, Prasad, and Chakrabarti to have obtained 
structural information by approximately optimizing an objective function. 

Regarding dependent claim 22, neither Pirolli nor Chakrabarti discloses repeated 
application of a growth transformation. However, given that a growth function is one which by 
definition stabilizes in a finite number of steps, it would have been obvious for one of ordinary 
skill in the art to have extended the combination of Pirolli, Prasad, and Chakrabarti to repeatedly 
apply a growth transformation. 

Regarding dependent claim 23, Pirolli does not disclose creating and storing a second 
matrix that represents an interim score for each document in each category. However, 
Chakrabarti teaches a technique of soft classification in which "after each iteration, all 
documents are assigned a vector containing estimated probabilities of belonging to each class." 
(Chakrabarti, col. 19, lines 27-29.) Therefore, it would have been obvious to one of ordinary 
skill in the art to have combined Pirolli, Prasad, and Chakrabarti to have created and stored a 
second matrix that represents an interim score for each document in each category. 

Regarding dependent claim 24, Pirolli does not disclose periodically normalizing the 
rows of the matrix by normalizing within each document, across all categories, whereby the 
score for one document in a particular category will depend on the scores for that document in 
other categories. However, Chakrabarti suggests such a step inasmuch as Chakrabarti teaches 
word vectors containing probabilities in which the score for one document in a particular 
category inherently depends on the scores for that document in all other categories. 
(Chakrabarti, col. 6, lines 46-65; col. 19, lines 27-29.) Therefore, it would have been obvious to 
one of ordinary skill in the art to have combined Pirolli, Prasad and Chakrabarti to have 
periodically normalized the rows of the matrix as recited in claim 24. 

Regarding dependent claim 25, Pirolli does not disclose periodically, as the matrix is 
being computed, normalizing columns of the matrix by normalizing within each category, across 
all documents, whereby the score for one document in a particular category depends on the 
scores for all other documents in that category. However, Chakrabarti suggests such a step 
inasmuch as Chakrabarti teaches word vectors containing probabilities in which the score for one 
document in a particular category inherently depends on the scores for that document in all other 
categories. (Chakrabarti, col. 6, lines 46-65; col. 19, lines 27-29.) Therefore, it would have been 
obvious to one of ordinary skill in the art to have combined Pirolli, Prasad and Chakrabarti to 
have periodically normalized the columns of the matrix as recited in claim 25. 
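
The difference between the row normalization of claim 24 and the column normalization of claim 25 can be illustrated with a minimal sketch. It assumes a score matrix with one row per document and one column per category, and normalization by dividing each entry by the sum of its row or column. Both the layout and the normalization rule are assumptions made for illustration and are not taken from the claims or from Chakrabarti.

```python
def normalize_rows(matrix):
    """Scale each row to sum to 1: within each document, across all categories."""
    result = []
    for row in matrix:
        total = sum(row)
        result.append([x / total if total else 0.0 for x in row])
    return result


def normalize_columns(matrix):
    """Scale each column to sum to 1: within each category, across all documents."""
    transposed = [list(col) for col in zip(*matrix)]
    normalized = normalize_rows(transposed)
    return [list(row) for row in zip(*normalized)]


# Hypothetical interim scores: two documents (rows) by two categories (columns).
scores = [[1.0, 1.0], [3.0, 1.0]]
print(normalize_rows(scores))     # prints [[0.5, 0.5], [0.75, 0.25]]
print(normalize_columns(scores))  # prints [[0.25, 0.5], [0.75, 0.5]]
```

After row normalization a document's score in one category depends on its scores in the other categories. After column normalization it depends on the scores of the other documents in the same category.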

Response to Arguments 

10. Applicants' argument that their amendment has cured the rejections of claims 19-25 and 
28-32 under 35 U.S.C. 112 is persuasive, and accordingly those rejections have not been 
maintained in this Office action. 

11. Applicant's arguments with respect to the rejection of claims 1 and 34 as anticipated by 
Pirolli have been considered but are moot in view of the new ground(s) of rejection. 

Conclusion 

12. Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Charles A. Bieneman whose telephone number is 703-305-8045. 
The examiner can normally be reached on Monday - Thursday, 7:00 a.m. - 5:30 p.m. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Heather Herndon, can be reached on 703-308-5186. The fax phone numbers for the 
organization where this application or proceeding is assigned are 703-746-7239 for regular 
communications and 703-746-7238 for After Final communications. 

Any inquiry of a general nature or relating to the status of this application or proceeding 
should be directed to the receptionist whose telephone number is 703-305-4700. 




CAB 

January 22, 2003 



